
add mcp example to llama stack docs #168

Merged
davidwtf merged 4 commits into master from fix/AI-23784
Mar 26, 2026

Conversation

davidwtf (Contributor) commented Mar 25, 2026

Summary by CodeRabbit

  • Documentation
    • Install now targets vLLM OpenAI-compatible endpoints via VLLM_URL; VLLM_API_TOKEN is optional (example commented); manifest example uses in-cluster vLLM base URL.
    • Default storage reduced from 20Gi to 1Gi.
    • Quickstart condensed: adds two tool-integration paths (client-side and MCP/FastMCP), a shared agent flow, dual tool-calling selector, and improved init/cleanup and error handling.
    • Added vLLM tool-calling guidance for KServe (container args and parser selection).
    • Removed separate MCP-only notebook; integrated MCP content into main quickstart.


coderabbitai bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 79eec8d1-c4ee-44b0-92dd-d600d76490e4

📥 Commits

Reviewing files that changed from the base of the PR and between d26eef2 and e807a06.

📒 Files selected for processing (1)
  • docs/public/llama-stack/llama-stack_quickstart.ipynb
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/public/llama-stack/llama-stack_quickstart.ipynb

Walkthrough

This PR updates Llama Stack docs and notebooks to replace third‑party LLM API secret guidance with vLLM OpenAI‑compatible endpoints (VLLM_URL), add vLLM/KServe tool‑calling guidance and predictor args, consolidate MCP examples into the main quickstart, and refactor the quickstart notebook to support dual tool integration paths (client-side and MCP).

Changes

Cohort / File(s) Summary
Install & vLLM config
docs/en/llama_stack/install.mdx
Replace provider API-secret guidance with VLLM_URL (example in-cluster http://.../v1), make VLLM_API_TOKEN optional (commented secret), reduce PVC from 20Gi to 1Gi, and add KServe/vLLM predictor container args and parser guidance for tool-calling (--enable-auto-tool-choice, --tool-call-parser).
Quickstart docs
docs/en/llama_stack/quickstart.mdx
Update prerequisites to require VLLM_URL; condense steps and emphasize two tool integration paths (local @client_tool and MCP via FastMCP + toolgroups.register) with a shared agent flow and optional FastAPI deployment.
Main quickstart notebook (refactor)
docs/public/llama-stack/llama-stack_quickstart.ipynb
Large refactor: add TOOL_OPTION selector, implement dual tool paths (client vs FastMCP), optional FastMCP bootstrap with readiness probing, conditional MCP toolgroup registration (provider_id="model-context-protocol"), shared AGENT_TOOLS, improved error handling and cleanup (terminate MCP process, clear MCP_SERVER_URL), add fastmcp dependency and troubleshooting cells.
Removed MCP‑only notebook
docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb
Deleted standalone MCP quickstart notebook; its examples consolidated into the main quickstart notebook under conditional cells.
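The dual tool-path selection described in the walkthrough can be sketched roughly as follows. This is a hypothetical illustration of the pattern, not the notebook's actual code; the names TOOL_OPTION, AGENT_TOOLS, get_weather, and the "mcp::weather" toolgroup id are taken from the walkthrough's terminology and may differ in the real notebook.

```python
# Illustrative sketch of the TOOL_OPTION selector; names are assumptions
# based on the walkthrough above, not the notebook's exact code.
TOOL_OPTION = "client"  # "client" for the local @client_tool path, "mcp" for FastMCP


def get_weather(city: str) -> str:
    """Stand-in local client tool used on the client-side path."""
    return f"Sunny in {city}"


if TOOL_OPTION == "client":
    # Client-side path: pass the callable directly to the agent.
    AGENT_TOOLS = [get_weather]
elif TOOL_OPTION == "mcp":
    # MCP path: reference the registered toolgroup by its identifier.
    AGENT_TOOLS = ["mcp::weather"]
else:
    raise ValueError(f"Unknown TOOL_OPTION: {TOOL_OPTION}")
```

Resetting AGENT_TOOLS in the same cell that assigns TOOL_OPTION (as one of the review comments later suggests) keeps reruns from silently reusing the other path's tools.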

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Notebook
  participant FastMCP
  participant Client as LlamaStackClient
  participant vLLM as vLLM/KServe
  rect rgba(200,230,255,0.5)
    Note over Notebook,FastMCP: TOOL_OPTION selects tool integration path
  end
  User->>Notebook: run notebook (choose TOOL_OPTION)
  alt Client-side tool
    Notebook->>Client: register local client tool (`@client_tool`) -> AGENT_TOOLS
    Client->>vLLM: send agent/model request (may include tool call)
    vLLM-->>Client: model response / tool invocation result
    Client-->>Notebook: stream results
  else MCP tool (FastMCP)
    Notebook->>FastMCP: spawn MCP server (background process)
    Notebook->>Client: register MCP toolgroup (provider_id=model-context-protocol) -> AGENT_TOOLS
    Client->>FastMCP: invoke MCP tool (tool metadata)
    FastMCP->>vLLM: forward/coordinate model calls
    vLLM-->>FastMCP: model response
    FastMCP-->>Client: tool result
    Client-->>Notebook: stream results
  end
  Notebook->>FastMCP: cleanup (terminate server)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • zhaomingkun1030
  • typhoonzero

Poem

🐰 I hopped through docs and notebooks bright,
Two tool paths tucked in one cozy light.
FastMCP hums, client tools sing,
Cleanup hops in at the end of the ring.
Cheers — a rabbit's joyful byte!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Title check (⚠️ Warning): The PR title 'add mcp example to llama stack docs' partially describes the changeset. While MCP is mentioned, the title does not capture the broader scope: the PR consolidates two separate MCP notebooks into one, restructures the quickstart guide, and updates installation documentation with vLLM configuration changes. Resolution: consider a more descriptive title such as 'Consolidate MCP examples and update LlamaStack docs for vLLM integration' to better reflect the full scope of documentation changes.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.15.6)
docs/public/llama-stack/llama-stack_quickstart.ipynb

Unexpected end of JSON input


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai bot left a comment

🧹 Nitpick comments (3)
docs/public/llama-stack/llama-stack_quickstart.ipynb (3)

197-214: Consider adding a readiness check instead of a fixed sleep.

The time.sleep(1.5) at line 203 is a race condition—the MCP server may not be ready to accept connections yet, especially on slower systems. Consider polling the endpoint for readiness.

💡 Suggested improvement for server readiness
 mcp_process = Process(target=_run_mcp_weather_server, daemon=True)
 mcp_process.start()

 import socket
 import time

-time.sleep(1.5)
+# Wait for server to be ready (up to 10 seconds)
+for _ in range(20):
+    time.sleep(0.5)
+    try:
+        with socket.create_connection(("127.0.0.1", 8002), timeout=1):
+            break
+    except OSError:
+        pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/public/llama-stack/llama-stack_quickstart.ipynb` around lines 197 - 214,
Replace the fixed sleep that waits for the MCP server with an active readiness
poll: after starting mcp_process (the Process created with target
_run_mcp_weather_server), repeatedly attempt an HTTP GET to the MCP endpoint
(use MCP_SERVER_URL if set or construct http://{host}:8002/mcp like the current
logic) with a short timeout and small backoff, stop polling when you receive a
successful response (2xx) or after a reasonable overall timeout, and only then
set os.environ["MCP_SERVER_URL"] and print the success message; keep the same
host/port (/mcp) and error/timeout handling so the notebook doesn’t hang
indefinitely.
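The readiness poll suggested above can be factored into a small reusable helper. This is a generic sketch, not code from the notebook: it probes at the TCP level rather than issuing the HTTP GET the prompt describes, and the host, port, and timeout values are illustrative.

```python
import socket
import time


def wait_for_server(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll a TCP endpoint until it accepts connections or the deadline passes.

    Returns True once a connection succeeds, False if the overall timeout
    elapses first, so the caller can fail fast instead of hanging.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(0.5)  # brief backoff before the next attempt
    return False
```

In the notebook's flow this would be called right after mcp_process.start(), raising a clear error (and terminating the child process) when it returns False, and only setting MCP_SERVER_URL on success.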

289-298: Redundant import and client initialization.

LlamaStackClient is imported and client is initialized earlier at lines 91-96. This cell re-imports LlamaStackClient and Agent, but then uses the existing client variable from the earlier cell. Consider removing the redundant import or clarifying that cells are designed to be run independently.

Note: If the intent is cell independence for users who skip Option A/B setup cells, this is acceptable but could be documented.

💡 Suggested simplification
 import os
-from llama_stack_client import LlamaStackClient, Agent
+from llama_stack_client import Agent


 if "AGENT_TOOLS" not in globals():
     raise RuntimeError("AGENT_TOOLS is missing. Run the matching setup cell(s) in section 2 for the selected TOOL_OPTION.")

 base_url = os.getenv("LLAMA_STACK_URL", "http://localhost:8321")
 print(f"Connecting to Server: {base_url}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/public/llama-stack/llama-stack_quickstart.ipynb` around lines 289 - 298,
The cell re-imports LlamaStackClient and Agent and references the
previously-created client, causing redundancy or confusion; either remove the
duplicate imports and rely on the earlier client variable (keep AGENT_TOOLS and
base_url handling) or make the cell self-contained by checking/creating a client
when missing (e.g., test for existing client variable and call
LlamaStackClient(...) to initialize it) so references to LlamaStackClient,
Agent, client, AGENT_TOOLS and base_url are consistent; update cell text to
document the chosen approach (dependent vs standalone cells).

205-210: IP detection may be unreliable in containerized environments.

The socket.gethostbyname(socket.gethostname()) approach may return unexpected results in containers or when multiple network interfaces exist. The fallback to MCP_SERVER_HOST env var is good, but consider documenting this limitation more prominently or defaulting to 127.0.0.1 with a clearer note.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/public/llama-stack/llama-stack_quickstart.ipynb` around lines 205 - 210,
The IP autodetection using socket.gethostbyname(socket.gethostname()) to set
MCP_SERVER_URL is unreliable in containers/multi-interface hosts; update the
logic around MCP_SERVER_URL so it prefers an explicit MCP_SERVER_HOST env var
(MCP_SERVER_HOST) or defaults to 127.0.0.1 when MCP_SERVER_HOST is missing, and
add a brief inline comment or documentation note next to the MCP_SERVER_URL
assignment explaining the container/network limitation and instructing users to
set MCP_SERVER_HOST when running inside containers; reference the MCP_SERVER_URL
assignment and the socket.gethostbyname/socket.gethostname calls when making the
change.
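The fallback order described in this prompt can be sketched as a small helper. This is an assumed shape, not the notebook's code; the function name resolve_mcp_host is hypothetical, while MCP_SERVER_HOST is the env var named in the review comment.

```python
import os
import socket


def resolve_mcp_host() -> str:
    """Pick a host for building MCP_SERVER_URL.

    Fallback order per the suggestion above: an explicit MCP_SERVER_HOST
    wins, then hostname-based autodetection, then 127.0.0.1. Inside
    containers, gethostbyname(gethostname()) may return an address that
    other workloads cannot reach, so set MCP_SERVER_HOST explicitly there.
    """
    explicit = os.getenv("MCP_SERVER_HOST")
    if explicit:
        return explicit
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:  # covers socket.gaierror and socket.herror
        return "127.0.0.1"
```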
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/public/llama-stack/llama-stack_quickstart.ipynb`:
- Around line 197-214: Replace the fixed sleep that waits for the MCP server
with an active readiness poll: after starting mcp_process (the Process created
with target _run_mcp_weather_server), repeatedly attempt an HTTP GET to the MCP
endpoint (use MCP_SERVER_URL if set or construct http://{host}:8002/mcp like the
current logic) with a short timeout and small backoff, stop polling when you
receive a successful response (2xx) or after a reasonable overall timeout, and
only then set os.environ["MCP_SERVER_URL"] and print the success message; keep
the same host/port (/mcp) and error/timeout handling so the notebook doesn’t
hang indefinitely.
- Around line 289-298: The cell re-imports LlamaStackClient and Agent and
references the previously-created client, causing redundancy or confusion;
either remove the duplicate imports and rely on the earlier client variable
(keep AGENT_TOOLS and base_url handling) or make the cell self-contained by
checking/creating a client when missing (e.g., test for existing client variable
and call LlamaStackClient(...) to initialize it) so references to
LlamaStackClient, Agent, client, AGENT_TOOLS and base_url are consistent; update
cell text to document the chosen approach (dependent vs standalone cells).
- Around line 205-210: The IP autodetection using
socket.gethostbyname(socket.gethostname()) to set MCP_SERVER_URL is unreliable
in containers/multi-interface hosts; update the logic around MCP_SERVER_URL so
it prefers an explicit MCP_SERVER_HOST env var (MCP_SERVER_HOST) or defaults to
127.0.0.1 when MCP_SERVER_HOST is missing, and add a brief inline comment or
documentation note next to the MCP_SERVER_URL assignment explaining the
container/network limitation and instructing users to set MCP_SERVER_HOST when
running inside containers; reference the MCP_SERVER_URL assignment and the
socket.gethostbyname/socket.gethostname calls when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a79c809c-a938-492e-9d3a-875a7b08b56f

📥 Commits

Reviewing files that changed from the base of the PR and between 8e7f560 and 9d54906.

📒 Files selected for processing (4)
  • docs/en/llama_stack/install.mdx
  • docs/en/llama_stack/quickstart.mdx
  • docs/public/llama-stack/llama-stack_quickstart.ipynb
  • docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb
💤 Files with no reviewable changes (1)
  • docs/public/llama-stack/llama-stack_quickstart_mcp.ipynb


cloudflare-workers-and-pages bot commented Mar 25, 2026

Deploying alauda-ai with Cloudflare Pages

Latest commit: e807a06
Status: ✅  Deploy successful!
Preview URL: https://2e569e01.alauda-ai.pages.dev
Branch Preview URL: https://fix-ai-23784.alauda-ai.pages.dev

View logs


coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/public/llama-stack/llama-stack_quickstart.ipynb`:
- Around line 233-240: The code currently overwrites
os.environ["MCP_SERVER_URL"] without preserving the caller's original value;
capture the original value (e.g., orig = os.environ.get("MCP_SERVER_URL"))
before setting MCP_SERVER_URL and set os.environ["MCP_SERVER_URL"] to the
computed value, then in the corresponding cleanup/teardown section restore the
environment to its prior state (if orig is None delete the key, otherwise set it
back to orig); apply the same capture-and-restore pattern for the other
occurrence referenced (the block around the later cleanup).
- Around line 104-106: The notebook can reuse a stale AGENT_TOOLS when
TOOL_OPTION is switched; update the TOOL_OPTION selector cell to reset/clear
AGENT_TOOLS (e.g., set AGENT_TOOLS = None or {}) whenever TOOL_OPTION is
assigned, and add a validation immediately before the agent construction step to
assert AGENT_TOOLS matches the chosen TOOL_OPTION (or raise an explicit error)
so reruns fail fast and the wrong backend is never used.
- Around line 114-116: The shared FastAPI test flow calls requests.post(...) but
the requests import is currently inside the Option A-only cell (which may be
skipped); move the requests import out of the Option A-only cell into a
common/non-optional setup cell (or add an explicit requests import in the API
test cell) so requests is always defined for the shared flow; update references
around get_weather and AGENT_TOOLS cells accordingly to ensure requests is
available regardless of TOOL_OPTION.
- Around line 183-241: The notebook spawns multiprocessing.Process with a nested
target (_run_mcp_weather_server) which is not picklable on spawn-based kernels
and uses a brittle time.sleep(5) readiness wait; move the server entrypoint out
of the cell into a top-level, importable function (e.g., module-level
run_mcp_weather_server) or launch the FastMCP server via subprocess instead of
Process, then replace the fixed sleep with an explicit readiness poll against
the MCP endpoint (MCP_SERVER_URL) with retry/backoff and timeout before setting
os.environ and printing; update usages of mcp_process, get_weather_mcp, and the
Process creation to reference the new top-level entrypoint.
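The capture-and-restore pattern requested in the first inline comment is a natural fit for a context manager. A minimal sketch, assuming only the standard library; temp_env is a hypothetical helper name, not something from the notebook.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env(key: str, value: str):
    """Set an environment variable, restoring its prior state on exit.

    If the variable was unset before, it is deleted again afterwards;
    otherwise the original value is put back.
    """
    original = os.environ.get(key)
    os.environ[key] = value
    try:
        yield
    finally:
        if original is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = original
```

Usage in the notebook's cleanup flow would look like `with temp_env("MCP_SERVER_URL", url): ...`, guaranteeing the caller's original MCP_SERVER_URL survives even if a cell raises.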

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d37c39e0-1029-46c4-8892-583e0d623e9e

📥 Commits

Reviewing files that changed from the base of the PR and between 9d54906 and 6e2721f.

📒 Files selected for processing (2)
  • docs/en/llama_stack/install.mdx
  • docs/public/llama-stack/llama-stack_quickstart.ipynb
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/en/llama_stack/install.mdx


coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/public/llama-stack/llama-stack_quickstart.ipynb`:
- Around line 255-261: Wrap the call to
socket.gethostbyname(socket.gethostname()) in a try/except that catches
socket.gaierror and socket.herror (and optionally a broad Exception) and on
exception fall back to using os.getenv("MCP_SERVER_HOST", "127.0.0.1") for
_host; then proceed to build MCP_SERVER_URL as before (look for the
MCP_SERVER_URL assignment and the _host logic around socket.gethostbyname to
apply this change).
- Around line 296-309: AGENT_TOOLS is using an invalid dict format for MCP;
after calling client.toolgroups.register(...) you must fetch the actual tool
objects or identifiers and pass those to Agent(tools=...), e.g. call
client.tools.list(toolgroup_id=toolgroup_id) and assign AGENT_TOOLS to that
returned list (or build a list of explicit MCP identifiers like
["mcp::filesystem::read_file"]) so Agent(tools=AGENT_TOOLS) receives valid tool
entries instead of the current {"type":..., "server_label":...,
"server_url":...} structure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3519d066-0229-46b8-8681-c7fd9604a6a6

📥 Commits

Reviewing files that changed from the base of the PR and between 6e2721f and d26eef2.

📒 Files selected for processing (1)
  • docs/public/llama-stack/llama-stack_quickstart.ipynb

@xZhizhizhizhizhi

/test-pass

davidwtf merged commit e1f395f into master Mar 26, 2026
3 checks passed